Goto

Collaborating Authors

 model change


Generalization Analysis on Learning with a Concurrent Verifier

Neural Information Processing Systems

Machine learning technologies have been used in a wide range of practical systems.In practical situations, it is natural to expect the input-output pairs of a machine learning model to satisfy some requirements.However, it is difficult to obtain a model that satisfies requirements by just learning from examples.A simple solution is to add a module that checks whether the input-output pairs meet the requirements and then modifies the model's outputs. Such a module, which we call a {\em concurrent verifier} (CV), can give a certification, although how the generalizability of the machine learning model changes using a CV is unclear. This paper gives a generalization analysis of learning with a CV. We analyze how the learnability of a machine learning model changes with a CV and show a condition where we can obtain a guaranteed hypothesis using a verifier only in the inference time.We also show that typical error bounds based on Rademacher complexity will be no larger than that of the original model when using a CV in multi-class classification and structured prediction settings.



Synthesising Counterfactual Explanations via Label-Conditional Gaussian Mixture Variational Autoencoders

Jiang, Junqi, Leofante, Francesco, Rago, Antonio, Toni, Francesca

arXiv.org Artificial Intelligence

Counterfactual explanations (CEs) provide recourse recommendations for individuals affected by algorithmic decisions. A key challenge is generating CEs that are robust against various perturbation types (e.g. input and model perturbations) while simultaneously satisfying other desirable properties. These include plausibility, ensuring CEs reside on the data manifold, and diversity, providing multiple distinct recourse options for single inputs. Existing methods, however, mostly struggle to address these multifaceted requirements in a unified, model-agnostic manner. We address these limitations by proposing a novel generative framework. First, we introduce the Label-conditional Gaussian Mixture Variational Autoencoder (L-GMVAE), a model trained to learn a structured latent space where each class label is represented by a set of Gaussian components with diverse, prototypical centroids. Building on this, we present LAPACE (LAtent PAth Counterfactual Explanations), a model-agnostic algorithm that synthesises entire paths of CE points by interpolating from inputs' latent representations to those learned latent centroids. This approach inherently ensures robustness to input changes, as all paths for a given target class converge to the same fixed centroids. Furthermore, the generated paths provide a spectrum of recourse options, allowing users to navigate the trade-off between proximity and plausibility while also encouraging robustness against model changes. In addition, user-specified actionability constraints can also be easily incorporated via lightweight gradient optimisation through the L-GMVAE's decoder. Comprehensive experiments show that LAPACE is computationally efficient and achieves competitive performance across eight quantitative metrics.


Optimal Robust Recourse with $L^p$-Bounded Model Change

Kyaw, Phone, Kayastha, Kshitij, Jabbari, Shahin

arXiv.org Artificial Intelligence

Recourse provides individuals who received undesirable labels (e.g., denied a loan) from algorithmic decision-making systems with a minimum-cost improvement suggestion to achieve the desired outcome. However, in practice, models often get updated to reflect changes in the data distribution or environment, invalidating the recourse recommendations (i.e., following the recourse will not lead to the desirable outcome). The robust recourse literature addresses this issue by providing a framework for computing recourses whose validity is resilient to slight changes in the model. However, since the optimization problem of computing robust recourse is non-convex (even for linear models), most of the current approaches do not have any theoretical guarantee on the optimality of the recourse. Recent work by Kayastha et. al. provides the first provably optimal algorithm for robust recourse with respect to generalized linear models when the model changes are measured using the $L^{\infty}$ norm. However, using the $L^{\infty}$ norm can lead to recourse solutions with a high price. To address this shortcoming, we consider more constrained model changes defined by the $L^p$ norm, where $p\geq 1$ but $p\neq \infty$, and provide a new algorithm that provably computes the optimal robust recourse for generalized linear models. Empirically, for both linear and non-linear models, we demonstrate that our algorithm achieves a significantly lower price of recourse (up to several orders of magnitude) compared to prior work and also exhibits a better trade-off between the implementation cost of recourse and its validity. Our empirical analysis also illustrates that our approach provides more sparse recourses compared to prior work and remains resilient to post-processing approaches that guarantee feasibility.


Explaining Concept Drift through the Evolution of Group Counterfactuals

Stępka, Ignacy, Stefanowski, Jerzy

arXiv.org Artificial Intelligence

Machine learning models in dynamic environments often suffer from concept drift, where changes in the data distribution degrade performance. While detecting this drift is a well-studied topic, explaining how and why the model's decision-making logic changes still remains a significant challenge. In this paper, we introduce a novel methodology to explain concept drift by analyzing the temporal evolution of group-based counterfactual explanations (GCEs). Our approach tracks shifts in the GCEs' cluster centroids and their associated counterfactual action vectors before and after a drift. These evolving GCEs act as an interpretable proxy, revealing structural changes in the model's decision boundary and its underlying rationale. We operationalize this analysis within a three-layer framework that synergistically combines insights from the data layer (distributional shifts), the model layer (prediction disagreement), and our proposed explanation layer. We show that such holistic view allows for a more comprehensive diagnosis of drift, making it possible to distinguish between different root causes, such as a spatial data shift versus a re-labeling of concepts.


What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns

Hedderich, Michael A., Wang, Anyi, Zhao, Raoyuan, Eichin, Florian, Fischer, Jonas, Plank, Barbara

arXiv.org Artificial Intelligence

Prompt engineering for large language models is challenging, as even small prompt perturbations or model changes can significantly impact the generated output texts. Existing evaluation methods of LLM outputs, either automated metrics or human evaluation, have limitations, such as providing limited insights or being labor-intensive. We propose Spotlight, a new approach that combines both automation and human analysis. Based on data mining techniques, we automatically distinguish between random (decoding) variations and systematic differences in language model outputs. This process provides token patterns that describe the systematic differences and guide the user in manually analyzing the effects of their prompts and changes in models efficiently. We create three benchmarks to quantitatively test the reliability of token pattern extraction methods and demonstrate that our approach provides new insights into established prompt data. From a human-centric perspective, through demonstration studies and a user study, we show that our token pattern approach helps users understand the systematic differences of language model outputs. We are further able to discover relevant differences caused by prompt and model changes (e.g. related to gender or culture), thus supporting the prompt engineering process and human-centric model behavior research.


RoFL: Robust Fingerprinting of Language Models

Tsai, Yun-Yun, Guo, Chuan, Yang, Junfeng, van der Maaten, Laurens

arXiv.org Artificial Intelligence

AI developers are releasing large language models (LLMs) under a variety of different licenses. Many of these licenses restrict the ways in which the models or their outputs may be used. This raises the question how license violations may be recognized. In particular, how can we identify that an API or product uses (an adapted version of) a particular LLM? We present a new method that enable model developers to perform such identification via fingerprints: statistical patterns that are unique to the developer's model and robust to common alterations of that model. Our method permits model identification in a black-box setting using a limited number of queries, enabling identification of models that can only be accessed via an API or product. The fingerprints are non-invasive: our method does not require any changes to the model during training, hence by design, it does not impact model quality. Empirically, we find our method provides a high degree of robustness to common changes in the model or inference settings. In our experiments, it substantially outperforms prior art, including invasive methods that explicitly train watermarks into the model.


Subgoal Discovery Using a Free Energy Paradigm and State Aggregations

Mesbah, Amirhossein, Hosseini, Reshad, Shariatpanahi, Seyed Pooya, Ahmadabadi, Majid Nili

arXiv.org Artificial Intelligence

Reinforcement learning (RL) plays a major role in solving complex sequential decision-making tasks. Hierarchical and goal-conditioned RL are promising methods for dealing with two major problems in RL, namely sample inefficiency and difficulties in reward shaping. These methods tackle the mentioned problems by decomposing a task into simpler subtasks and temporally abstracting a task in the action space. One of the key components for task decomposition of these methods is subgoal discovery. We can use the subgoal states to define hierarchies of actions and also use them in decomposing complex tasks. Under the assumption that subgoal states are more unpredictable, we propose a free energy paradigm to discover them. This is achieved by using free energy to select between two spaces, the main space and an aggregation space. The $model \; changes$ from neighboring states to a given state shows the unpredictability of a given state, and therefore it is used in this paper for subgoal discovery. Our empirical results on navigation tasks like grid-world environments show that our proposed method can be applied for subgoal discovery without prior knowledge of the task. Our proposed method is also robust to the stochasticity of environments.


Generalization Analysis on Learning with a Concurrent Verifier

Neural Information Processing Systems

Machine learning technologies have been used in a wide range of practical systems.In practical situations, it is natural to expect the input-output pairs of a machine learning model to satisfy some requirements.However, it is difficult to obtain a model that satisfies requirements by just learning from examples.A simple solution is to add a module that checks whether the input-output pairs meet the requirements and then modifies the model's outputs. Such a module, which we call a {\em concurrent verifier} (CV), can give a certification, although how the generalizability of the machine learning model changes using a CV is unclear. This paper gives a generalization analysis of learning with a CV. We analyze how the learnability of a machine learning model changes with a CV and show a condition where we can obtain a guaranteed hypothesis using a verifier only in the inference time.We also show that typical error bounds based on Rademacher complexity will be no larger than that of the original model when using a CV in multi-class classification and structured prediction settings.


Counterfactual Explanations with Probabilistic Guarantees on their Robustness to Model Change

Stępka, Ignacy, Lango, Mateusz, Stefanowski, Jerzy

arXiv.org Artificial Intelligence

Counterfactual explanations (CFEs) guide users on how to adjust inputs to machine learning models to achieve desired outputs. While existing research primarily addresses static scenarios, real-world applications often involve data or model changes, potentially invalidating previously generated CFEs and rendering user-induced input changes ineffective. Current methods addressing this issue often support only specific models or change types, require extensive hyperparameter tuning, or fail to provide probabilistic guarantees on CFE robustness to model changes. This paper proposes a novel approach for generating CFEs that provides probabilistic guarantees for any model and change type, while offering interpretable and easy-to-select hyperparameters. We establish a theoretical framework for probabilistically defining robustness to model change and demonstrate how our BetaRCE method directly stems from it. BetaRCE is a post-hoc method applied alongside a chosen base CFE generation method to enhance the quality of the explanation beyond robustness. It facilitates a transition from the base explanation to a more robust one with user-adjusted probability bounds. Through experimental comparisons with baselines, we show that BetaRCE yields robust, most plausible, and closest to baseline counterfactual explanations.